Cross-Reference to Related Applications

This application claims priority under 35 USC §119(e)(1) to
provisional application Ser. No. 60/242,498 filed Oct. 23, 2000.

Field of the Invention

The present invention relates to speech decoders, and more
particularly to methods used to handle bad frames received by
speech decoders.



Background of the Invention 



In digital cellular systems, a bit stream is said to be
transmitted through a communication channel connecting a mobile
station to a base station over the air interface. The bit
stream is organized into frames, including speech frames.
Whether or not an error occurs during transmission depends on
prevailing channel conditions. A speech frame that is detected
to contain errors is called simply a bad frame. According to
the prior art, in case of a bad frame, speech parameters derived
from past correct parameters (of non-erroneous speech frames)
are substituted for the speech parameters of the bad frame. The
aim of bad frame handling by making such a substitution is to
conceal the corrupted speech parameters of the erroneous speech
frame without causing a noticeable degradation of the speech
quality.

Modern speech codecs operate by processing a speech signal
in short segments, the above-mentioned frames. A typical frame
length of a speech codec is 20 ms, which corresponds to 160
speech samples, assuming an 8 kHz sampling frequency. In
so-called wideband codecs, frame length can again be 20 ms, but
can correspond to 320 speech samples, assuming a 16 kHz sampling
frequency. A frame may be further divided into a number of
subframes.
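The sample counts quoted above follow directly from the frame duration and the sampling frequency. A minimal sketch of that arithmetic is given below; the function name is hypothetical and is used only for illustration.

```python
def samples_per_frame(frame_ms: float, sampling_hz: int) -> int:
    """Number of speech samples in one frame of the given duration."""
    return round(frame_ms * sampling_hz / 1000)


# 20 ms frames at the two sampling frequencies mentioned in the text.
narrowband_samples = samples_per_frame(20, 8_000)
wideband_samples = samples_per_frame(20, 16_000)
```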

For every frame, an encoder determines a parametric
representation of the input signal. The parameters are
quantized and then transmitted through a communication channel
in digital form. A decoder produces a synthesized speech signal
based on the received parameters (see Fig. 1).

A typical set of extracted coding parameters includes
spectral parameters (so-called linear predictive coding
parameters, or LPC parameters) used in short-term prediction,
parameters used for long-term prediction of the signal
(so-called long-term prediction parameters or LTP parameters),
various gain parameters, and finally, excitation parameters.
What is called linear predictive coding is a widely used and
successful method for coding speech for transmission over a
communication channel; it represents the frequency shaping
attributes of the vocal tract. LPC parameterization
characterizes the shape of the spectrum of a short segment of
speech. The LPC parameters can be represented as either LSFs
(Line Spectral Frequencies) or, equivalently, as ISPs (Immittance
Spectral Pairs). ISPs are obtained by decomposing the inverse
filter transfer function A(z) to a set of two transfer functions,
one having even symmetry and the other having odd symmetry. The
ISPs, also called Immittance Spectral Frequencies (ISFs), are the
roots of these polynomials on the z-unit circle. Line Spectral
Pairs (also called Line Spectral Frequencies) can be defined in
the same way as Immittance Spectral Pairs; the difference between
these representations is the conversion algorithm, which
transforms the LP filter coefficients into another LPC parameter
representation (LSP or ISP).

Sometimes the condition of the communication channel
through which the encoded speech parameters are transmitted is
poor, causing errors in the bit stream, i.e. causing frame
errors (and so causing bad frames). There are two kinds of
frame errors: lost frames and corrupted frames. In a corrupted
frame, only some of the parameters describing a particular
speech segment (typically of 20 ms duration) are corrupted. In
a lost frame type of frame error, a frame is either totally
corrupted or is not received at all.

In a packet-based transmission system for communicating
speech (a system in which a frame is usually conveyed as a
single packet), such as is sometimes provided by an ordinary
Internet connection, it is possible that a data packet (or
frame) will never reach the intended receiver or that a data
packet (or frame) will arrive so late that it cannot be used
because of the real-time nature of spoken speech. Such a frame
is called a lost frame. A corrupted frame in such a situation
is a frame that does arrive (usually within a single packet) at
the receiver but that contains some parameters that are in
error, as indicated for example by a cyclic redundancy check
(CRC). This is usually the situation in a circuit-switched
connection, such as a connection in a Global System for Mobile
Communications (GSM) system, where the bit error rate (BER) in a
corrupted frame is typically below 5%.

Thus, it can be seen that the optimal corrective response
to an incidence of a bad frame is different for the two cases of
bad frames (the corrupted frame and the lost frame). There are
different responses because in case of corrupted frames, there
is unreliable information about the parameters, and in case of
lost frames, no information is available.

According to the prior art, when an error is detected in a
received speech frame, a substitution and muting procedure is
begun; the speech parameters of the bad frame are replaced by
attenuated or modified values from the previous good frame,
although some of the least important parameters from the
erroneous frame are used, e.g. the code excited linear
prediction parameters (CELPs), or more simply the excitation
parameters.

In some methods according to the prior art, a buffer is
used (in the receiver) called the parameter history, where the
last speech parameters received without error are stored. When
a frame is received without error, the parameter history is
updated and the speech parameters conveyed by the frame are used
for decoding. When a bad frame is detected, via a CRC check or
some other error detection method, a bad frame indicator (BFI)
is set to true and parameter concealment (substitution for and
muting of the corresponding bad frames) is then begun; the
prior-art methods for parameter concealment use parameter
history for concealing corrupted frames. As mentioned above,
when a received frame is classified as a bad frame (BFI set to
true), some speech parameters may be used from the bad frame;
for example, in the example solution for corrupted frame
substitution of a GSM AMR (adaptive multi-rate) speech codec
given in ETSI (European Telecommunications Standards Institute)
specification 06.91, the excitation vector from the channel is
always used. When a speech frame is lost (including the
situation where a frame arrives too late to be used, such as for
example in some IP-based transmission systems), obviously no
parameters are available from the lost frame to be used.
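The parameter history and BFI handling described above can be sketched as follows. This is only an illustrative outline: the class and function names, the buffer depth, and the use of a caller-supplied concealment callback are assumptions, not details taken from any codec specification.

```python
from collections import deque


class ParameterHistory:
    """Buffer holding the speech parameters of the last error-free frames."""

    def __init__(self, depth=3):
        # depth is a hypothetical buffer length, not a value from the text.
        self._good = deque(maxlen=depth)

    def update(self, params):
        """Store the parameters of a frame received without error."""
        self._good.appendleft(list(params))

    def last_good(self):
        """Parameters of the most recent good frame, or None if empty."""
        return self._good[0] if self._good else None


def decode_parameters(history, params, bfi, conceal):
    """Return the parameters to decode with for one received frame.

    params is None for a lost frame; conceal is a hypothetical callback
    that derives substitute parameters from the history.
    """
    if not bfi:
        history.update(params)  # good frame: update the history and use it
        return list(params)
    return conceal(history, params)  # bad frame: begin concealment
```

A caller would pass, for example, a callback that simply repeats the last good parameters, or one of the shifted-mean substitutions discussed in this document.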
In some prior-art systems, the last good spectral
parameters received are substituted for the spectral parameters
of a bad frame, after being slightly shifted towards a constant
predetermined mean. According to the GSM 06.91 ETSI
specification, the concealment is done in LSF format, and is
given by the following algorithm,

For i = 0 to N-1:
    LSF_q1(i) = α*past_LSF_q(i) + (1-α)*mean_LSF(i);   (eq. 1.0)
    LSF_q2(i) = LSF_q1(i);

where α = 0.95 and N is the order of the linear predictive (LP)
filter being used. The quantity LSF_q1 is the quantized LSF
vector of the second subframe, and the quantity LSF_q2 is the
quantized LSF vector of the fourth subframe. The LSF vectors of
the first and third subframes are interpolated from these two
vectors. (The LSF vector for the first subframe in the frame n
is interpolated from the LSF vector of the fourth subframe in the
frame n-1, i.e. the previous frame.) The quantity past_LSF_q is
the quantity LSF_q2 from the previous frame. The quantity
mean_LSF is a vector whose components are predetermined
constants; the components do not depend on a decoded speech
sequence. The quantity mean_LSF with constant components
generates a constant speech spectrum.
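The prior-art substitution of equation (1.0) can be illustrated with the short sketch below. It assumes the LSF vectors are held as plain lists of floats; the function name is hypothetical, and the sketch is not a reproduction of the GSM 06.91 reference code.

```python
def conceal_prior_art(past_lsf_q, mean_lsf, alpha=0.95):
    """Shift the last good LSF vector towards a constant mean (eq. 1.0).

    past_lsf_q: LSF_q2 of the previous frame.
    mean_lsf:   vector of predetermined constants.
    Returns (LSF_q1, LSF_q2) for the bad frame.
    """
    lsf_q1 = [alpha * p + (1.0 - alpha) * m
              for p, m in zip(past_lsf_q, mean_lsf)]
    lsf_q2 = list(lsf_q1)  # the fourth-subframe vector repeats the second
    return lsf_q1, lsf_q2
```

Because mean_lsf never changes, repeated bad frames pull every talker's spectrum towards the same fixed shape.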

Such prior-art systems always shift the spectrum
coefficients towards constant quantities, here indicated as
mean_LSF(i). The constant quantities are constructed by
averaging over a long time period and over several successive
talkers. Such systems therefore offer only a compromise
solution, not a solution that is optimal for any particular
speaker or situation; the tradeoff of the compromise is between
leaving annoying artifacts in the synthesized speech, and making
the speech more natural in how it sounds (i.e. the quality of
the synthesized speech).

What is needed is an improved spectral parameter 
substitution in case of a corrupted speech frame, possibly a 
substitution based on both an analysis of the speech parameter 
history and the erroneous frame. Suitable substitution for 
erroneous speech frames has a significant effect on the quality 
of the synthesized speech produced from the bit stream. 

Summary Of The Invention 

Accordingly, the present invention provides a method and 
corresponding apparatus for concealing the effects of frame 
errors in frames to be decoded by a decoder in providing 
synthesized speech, the frames being provided over a 
communication channel to the decoder, each frame providing 
parameters used by the decoder in synthesizing speech, the 
method including the steps of: determining whether a frame is a 
bad frame; and providing a substitution for the parameters of 
the bad frame based on an at least partly adaptive mean of the 
spectral parameters of a predetermined number of the most 
recently received good frames. 

In a further aspect of the invention, the method also
includes the step of determining whether the bad frame conveys
stationary or non-stationary speech, and, in addition, the step
of providing a substitution for the bad frame is performed in a
way that depends on whether the bad frame conveys stationary or
non-stationary speech. In a still further aspect of the
invention, in case of a bad frame conveying stationary speech,
the step of providing a substitution for the bad frame is
performed using a mean of parameters of a predetermined number
of the most recently received good frames. In another still
further aspect of the invention, in case of a bad frame
conveying non-stationary speech, the step of providing a
substitution for the bad frame is performed using at most a
predetermined portion of a mean of parameters of a predetermined
number of the most recently received good frames.

In another further aspect of the invention, the method also
includes the step of determining whether the bad frame meets a
predetermined criterion, and if so, using the bad frame instead
of substituting for the bad frame. In a still further aspect of
the invention with such a step, the predetermined criterion
involves making one or more of four comparisons: an inter-frame
comparison, an intra-frame comparison, a two-point comparison,
and a single-point comparison.

From another perspective, the invention is a method for
concealing the effects of frame errors in frames to be decoded
by a decoder in providing synthesized speech, the frames being
provided over a communication channel to the decoder, each frame
providing parameters used by the decoder in synthesizing speech,
the method including the steps of: determining whether a frame
is a bad frame; and providing a substitution for the parameters
of the bad frame, a substitution in which past immittance
spectral frequencies (ISFs) are shifted towards a partly
adaptive mean given by:

$$ISF_q(i) = \alpha \cdot past\_ISF_q(i) + (1-\alpha) \cdot ISF_{mean}(i), \quad \text{for } i = 0..16,$$

where

α = 0.9,

ISF_q(i) is the i-th component of the ISF vector for
a current frame,

past_ISF_q(i) is the i-th component of the ISF vector
from the previous frame,

ISF_mean(i) is the i-th component of the vector that
is a combination of the adaptive mean and the constant
predetermined mean ISF vectors, and is calculated using the
formula:

$$ISF_{mean}(i) = \beta \cdot ISF_{const\_mean}(i) + (1-\beta) \cdot ISF_{adaptive\_mean}(i), \quad \text{for } i = 0..16,$$

where β = 0.75, where ISF_adaptive_mean(i) is a mean of
past_ISF_q(i) taken over the most recently received good frames
and is updated whenever BFI = 0, where BFI is a bad frame
indicator, and where ISF_const_mean(i) is the i-th component of a
vector formed from a long-time average of ISF vectors.
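As a worked illustration of the two formulas above, substituting the expression for ISF_mean(i) into the expression for ISF_q(i) shows how much each of the three quantities contributes to the substituted value. The expansion assumes that ISF_mean(i) is the β-weighted combination of the constant mean and the adaptive mean, as the definitions above indicate, and uses the stated values α = 0.9 and β = 0.75.

```latex
\begin{aligned}
ISF_q(i) &= \alpha\, past\_ISF_q(i)
          + (1-\alpha)\bigl[\beta\, ISF_{const\_mean}(i)
          + (1-\beta)\, ISF_{adaptive\_mean}(i)\bigr] \\
         &= \alpha\, past\_ISF_q(i)
          + (1-\alpha)\beta\, ISF_{const\_mean}(i)
          + (1-\alpha)(1-\beta)\, ISF_{adaptive\_mean}(i) \\
         &= 0.9\, past\_ISF_q(i)
          + 0.075\, ISF_{const\_mean}(i)
          + 0.025\, ISF_{adaptive\_mean}(i)
\end{aligned}
```

The three weights sum to one, so each substituted component remains a weighted average of the previous value and the two means.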



Brief Description of the Drawings 



The above and other objects, features and advantages of the
invention will become apparent from a consideration of the
subsequent detailed description presented in connection with
accompanying drawings, in which:

Fig. 1 is a block diagram of components of a system
according to the prior art for transmitting or storing speech
and audio signal;

Fig. 2 is a graph illustrating LSF coefficients [0 ... 4 kHz]
of adjacent frames in a case of stationary speech, the Y-axis
being frequency and the X-axis being frames;

Fig. 3 is a graph illustrating LSF coefficients [0 ... 4 kHz]
of adjacent frames in case of non-stationary speech, the Y-axis
being frequency and the X-axis being frames;

Fig. 4 is a graph illustrating absolute spectral deviation
error in the prior-art method;

Fig. 5 is a graph illustrating absolute spectral deviation
error in the present invention (showing that the present
invention gives better substitution for spectral parameters than
the prior-art method), where the highest bar in the graph
(indicating the most probable residual) is approximately zero;

Fig. 6 is a schematic flow diagram illustrating how bits
are classified according to some prior art when a bad frame is
detected;

Fig. 7 is a flowchart of the overall method of the
invention; and

Fig. 8 is a set of two graphs illustrating aspects of the
criteria used to determine whether or not an LSF of a frame
indicated as having errors is acceptable.



Best Mode For Carrying Out The Invention 



According to the invention, when a bad frame is detected by
a decoder after transmission of a speech signal through a
communication channel (Fig. 1), the corrupted spectral
parameters of the speech signal are concealed (by substituting
other spectral parameters for them) based on an analysis of the
spectral parameters recently communicated through the
communication channel. It is important to effectively conceal
corrupted spectral parameters of a bad frame not only because
the corrupted spectral parameters may cause artifacts (audible
sounds that are obviously not speech), but also because the
subjective quality of subsequent error-free speech frames
decreases (at least when linear predictive quantization is
used).

An analysis according to the invention also makes use of
the localized nature of the spectral impact of the spectral
parameters, such as line spectral frequencies (LSFs). The
spectral impact of LSFs is said to be localized in that if one
LSF parameter is adversely altered by a quantization and coding
process, the LP spectrum will change only near the frequency
represented by the LSF parameter, leaving the rest of the
spectrum unchanged.

The invention in general, for either a lost frame or a corrupt
frame

According to the invention, an analyzer determines the
spectral parameter concealment in case of a bad frame based on
the history of previously received speech parameters. The
analyzer determines the type of the decoded speech signal (i.e.
whether it is stationary or non-stationary). The history of the
speech parameters is used to classify the decoded speech signal
(as stationary or not, and more specifically, as voiced or not);
the history that is used can be derived mainly from the most
recent values of LTP and spectral parameters.

The terms stationary speech signal and voiced speech signal
are practically synonymous; a voiced speech sequence is usually
a relatively stationary signal, while an unvoiced speech
sequence is usually not. We use the terminology stationary and
non-stationary speech signals here because that terminology is
more precise.

A frame can be classified as voiced or unvoiced (and also
stationary or non-stationary) according to the ratio of the
power of the adaptive excitation to that of the total
excitation, as indicated in the frame for the speech
corresponding to the frame. (A frame contains parameters
according to which both adaptive and total excitation are
constructed; after doing so, the total power can be calculated.)
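The classification by excitation power ratio can be sketched as below. The excitation signals are assumed to be available as lists of samples already constructed from the frame parameters, and the threshold value is a hypothetical placeholder; the text does not state a specific threshold.

```python
def excitation_power(samples):
    """Mean power of an excitation signal given as a list of samples."""
    return sum(s * s for s in samples) / len(samples)


def adaptive_power_ratio(adaptive_exc, total_exc):
    """Ratio of the adaptive excitation power to the total excitation power."""
    total = excitation_power(total_exc)
    if total <= 0.0:
        return 0.0  # silent frame: treat as having no adaptive contribution
    return excitation_power(adaptive_exc) / total


def is_stationary(adaptive_exc, total_exc, threshold=0.5):
    # threshold is a hypothetical value chosen only for illustration.
    return adaptive_power_ratio(adaptive_exc, total_exc) >= threshold
```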

If a speech sequence is stationary, the methods of the
prior art by which corrupted spectral parameters are concealed,
as indicated above, are not particularly effective. This is
because stationary adjacent spectral parameters are changing
slowly, so the previous good spectral values (not corrupted or
lost spectral values) are usually good estimates for the next
spectral coefficients, and more specifically, are better than
the spectral parameters from the previous frame driven towards
the constant mean, which the prior art would use in place of the
bad spectral parameters (to conceal them). Fig. 2 illustrates,
for a stationary speech signal (and more particularly a voiced
speech signal), the characteristics of LSFs, as one example of
spectral parameters; it illustrates LSF coefficients [0 ... 4 kHz]
of adjacent frames of stationary speech, the Y-axis being
frequency and the X-axis being frames, showing that the LSFs do
change relatively slowly, from frame to frame, for stationary
speech.



During stationary speech segments, concealment is performed
according to the invention (for either lost or corrupted frames)
using the following algorithm:

For i = 0 to N-1 (elements within a frame):
    adaptive_mean_LSF(i) = (past_LSF_good(i)(0) + past_LSF_good(i)(1) + ... + past_LSF_good(i)(K-1))/K;
    LSF_q1(i) = α*past_LSF_good(i)(0) + (1-α)*adaptive_mean_LSF(i);   (2.1)
    LSF_q2(i) = LSF_q1(i).

where α can be approximately 0.95, N is the order of the LP
filter, and K is the adaptation length. LSF_q1(i) is the
quantized LSF vector of the second subframe and LSF_q2(i) is the
quantized LSF vector of the fourth subframe. The LSF vectors of
the first and third subframes are interpolated from these two
vectors. The quantity past_LSF_good(i)(0) is equal to the value
of the quantity LSF_q2(i) from the previous good frame. The
quantity past_LSF_good(i)(n) is a component of the vector of LSF
parameters from the (n+1)th previous good frame (i.e. the good
frame that precedes the present bad frame by n+1 frames).
Finally, the quantity adaptive_mean_LSF(i) is the mean
(arithmetic average) of the previous good LSF vectors (i.e. it
is a component of a vector quantity, each component being a mean
of the corresponding components of the previous good LSF
vectors).
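The stationary-speech algorithm of equation (2.1) can be sketched as follows. The history is assumed to be a list of the K most recent good LSF vectors, newest first, so that history[n][i] plays the role of past_LSF_good(i)(n) in the text; the function names are hypothetical.

```python
def adaptive_mean_lsf(past_lsf_good):
    """Component-wise arithmetic mean of the last K good LSF vectors."""
    k = len(past_lsf_good)
    order = len(past_lsf_good[0])
    return [sum(frame[i] for frame in past_lsf_good) / k
            for i in range(order)]


def conceal_stationary(past_lsf_good, alpha=0.95):
    """Shift the newest good LSF vector towards the adaptive mean (eq. 2.1).

    past_lsf_good[n][i] is the i-th LSF of the n-th most recent good
    frame (n = 0 is the newest). Returns (LSF_q1, LSF_q2).
    """
    mean = adaptive_mean_lsf(past_lsf_good)
    newest = past_lsf_good[0]
    lsf_q1 = [alpha * p + (1.0 - alpha) * m for p, m in zip(newest, mean)]
    return lsf_q1, list(lsf_q1)
```

Because the mean is computed from the talker's own recent frames, the substituted spectrum stays close to the slowly varying spectrum of the stationary segment rather than drifting towards a fixed shape.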

It has been demonstrated that the adaptive mean method of
the invention improves the subjective quality of synthesized
speech compared to the method of the prior art. The
demonstration used simulations where speech is transmitted
through an error-inducing communication channel. Each time a
bad frame was detected, the spectral error was calculated. The
spectral error was obtained by subtracting, from the original
spectrum, the spectrum that was used for concealing during the
bad frame. The absolute error is calculated by taking the
absolute value of the spectral error. Figs. 4 and 5 show the
histograms of absolute deviation error of LSFs for the prior art
and for the invented method, respectively. The optimal error
concealment has an error close to zero, i.e. when the error is
close to zero, the spectral parameters used for concealing are
very close to the original (corrupted or lost) spectral
parameters. As can be seen from the histograms of Figs. 4 and
5, the adaptive mean method of the invention (Fig. 5) conceals
errors better than the prior-art method (Fig. 4) during
stationary speech sequences.

As mentioned above, the spectral coefficients of
non-stationary signals (or, less precisely, unvoiced signals)
fluctuate between adjacent frames, as indicated in Fig. 3, which
is a graph illustrating LSFs of adjacent frames in case of
non-stationary speech, the Y-axis being frequency and the X-axis
being frames. In such a case, the optimal concealment method is
not the same as in the case of a stationary speech signal. For
non-stationary speech, the invention provides concealment for
bad (corrupted or lost) non-stationary speech segments according
to the following algorithm (the non-stationary algorithm):

For i = 0 to N-1:
    partly_adaptive_mean_LSF(i) = β*mean_LSF(i) + (1-β)*adaptive_mean_LSF(i);   (2.3)
    LSF_q1(i) = α*past_LSF_good(i)(0) + (1-α)*partly_adaptive_mean_LSF(i);   (2.2)
    LSF_q2(i) = LSF_q1(i);

where N is the order of the LP filter, where α is typically
approximately 0.90, where LSF_q1(i) and LSF_q2(i) are two sets
of LSF vectors for the current frame as in equation (2.1), where
past_LSF_good(i)(0) is LSF_q2(i) from the previous good frame,
where partly_adaptive_mean_LSF(i) is a combination of the
adaptive mean LSF vector and the average LSF vector, where
adaptive_mean_LSF(i) is the mean of the last K good LSF vectors
(which is updated when BFI is not set), and where mean_LSF(i) is
a constant average LSF and is generated during the design
process of the codec being used to synthesize speech; it is an
average LSF of some speech database. The parameter β is
typically approximately 0.75, a value used to express the extent
to which the speech is stationary as opposed to non-stationary.
(It is sometimes calculated based on the ratio of the long-term
prediction excitation energy to the fixed codebook excitation
energy, or more precisely, using the formula

$$\beta = \frac{1 + voiceFactor}{2}$$

where

$$voiceFactor = \frac{energy_{pitch} - energy_{innovation}}{energy_{pitch} + energy_{innovation}}$$

in which energy_pitch is the energy of pitch excitation and
energy_innovation is the energy of the innovation code excitation.
When most of the energy is in long-term prediction excitation,
the speech being decoded is mostly stationary. When most of the
energy is in the fixed codebook excitation, the speech is mostly
non-stationary.)
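The non-stationary algorithm of equations (2.2) and (2.3), together with the voiceFactor formula as given in the text, can be sketched as follows. Vectors are assumed to be plain lists of floats and all function names are hypothetical; the guard for zero total energy is an added assumption.

```python
def voice_factor(energy_pitch, energy_innovation):
    """voiceFactor as given in the text; 0.0 if both energies are zero."""
    total = energy_pitch + energy_innovation
    if total == 0.0:
        return 0.0  # assumption: no excitation energy at all
    return (energy_pitch - energy_innovation) / total


def beta_from_voice_factor(vf):
    """Formula for beta as given in the text."""
    return (1.0 + vf) / 2.0


def partly_adaptive_mean_lsf(mean_lsf, adaptive_mean, beta=0.75):
    """Blend the constant mean and the adaptive mean (eq. 2.3)."""
    return [beta * m + (1.0 - beta) * a
            for m, a in zip(mean_lsf, adaptive_mean)]


def conceal_non_stationary(last_good_lsf, mean_lsf, adaptive_mean,
                           alpha=0.90, beta=0.75):
    """Shift the last good LSFs towards the partly adaptive mean (eq. 2.2)."""
    target = partly_adaptive_mean_lsf(mean_lsf, adaptive_mean, beta)
    lsf_q1 = [alpha * p + (1.0 - alpha) * t
              for p, t in zip(last_good_lsf, target)]
    return lsf_q1, list(lsf_q1)
```

Setting beta to 1.0 makes the target equal to the constant mean, and setting it to 0.0 makes the target equal to the adaptive mean, so a single routine covers both extremes as well as any compromise value in between.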

For β = 1.0, equations (2.2) and (2.3) reduce to equation (1.0),
which is the prior art. For β = 0.0, equations (2.2) and (2.3)
reduce to equation (2.1), which is used by the present invention
for stationary segments. For complexity-sensitive implementations
(in applications where it is important to keep complexity to a
reasonable level), β can be fixed to some compromise value, e.g.
0.75, for both stationary and non-stationary segments.

Spectral parameter concealment specifically for lost frames



In case of a lost frame, only the information of past
spectral parameters is available. The substituted spectral
parameters are calculated according to a criterion based on
parameter histories of, for example, spectral and LTP (long-term
prediction) values; LTP parameters include LTP gain and LTP lag
value. LTP represents the correlation of a current frame to a
previous frame. For example, the criterion used to calculate
the substituted spectral parameters can distinguish situations
where the last good LSFs should be modified by an adaptive LSF
mean or, as in the prior art, by a constant mean.

Alternative spectral parameter concealment specifically for
corrupted frames

When a speech frame is corrupted (as opposed to lost), the
concealment procedure of the invention can be further optimized.
In such a case, the spectral parameters can be completely or
partially correct when received in the speech decoder. For
example, in a packet-based connection (as in an ordinary TCP/IP
Internet connection), the corrupted frames concealment method is
usually not possible because with TCP/IP type connections
usually all bad frames are lost frames, but for other kinds of
connections, such as in the circuit-switched GSM or EDGE
connections, the corrupted frames concealment method of the
invention can be used. Thus, for packet-switched connections,
the following alternative method cannot be used, but for
circuit-switched connections, it can be used, since in such
connections bad frames are at least sometimes (and in fact
usually) only corrupted frames.

According to the specifications for GSM, a bad frame is
detected when a BFI flag is set following a CRC check or other
error detection mechanism used in the channel decoding process.
Error detection mechanisms are used to detect errors in the
subjectively most significant bits, i.e. those bits having the
greatest effect on the quality of the synthesized speech. In
some prior art methods, these most significant bits are not used
when a frame is indicated to be a bad frame. However, a frame
may have only a few bit errors (even one being enough to set the
BFI flag), so the whole frame could be discarded even though
most of the bits are correct. A CRC check detects simply
whether or not a frame has erroneous bits, but makes no
estimate of the BER (bit error rate). Fig. 6 illustrates how
bits are classified according to the prior art when a bad frame
is detected. In Fig. 6, a single frame is shown being
communicated, one bit at a time (from left to right), to a
decoder over a communications channel with conditions such that
some bits of the frame included in a CRC check are corrupted,
and so the BFI is set to one.

As can be seen from Fig. 6, even when a received frame
sometimes contains many correct bits (the BER in a frame usually
being small when channel conditions are relatively good), the
prior art does not use them. In contrast, the present invention
tries to estimate if the received parameters are corrupted and
if they are not, the invented method uses them.



Table 1 demonstrates the idea behind the corrupted frame 
concealment according to the invention in the example of an 
adaptive multi-rate (AMR) wideband (WB) decoder. 





C/I [dB]                             10      9       8       7       6
BER                                  3.72%   4.58%   5.56%   6.70%   7.98%
FER                                  0.30%   0.74%   1.62%   3.45%   7.16%
Correct spectral parameter indexes   84%     77%     68%     64%     60%
Totally correct spectrum             47%     38%     32%     27%     24%

Table 1. Percentage of correct spectral parameters in a corrupted
speech frame.



In case of an AMR WB decoder, mode 12.65 kbit/s is a good choice
to use when the channel carrier to interference ratio (C/I) is
in the range from approximately 9 dB to 10 dB. From Table 1, it
can be seen that in case of GSM channel conditions with a C/I in
the range 9 to 10 dB using a GMSK (Gaussian Minimum-Shift
Keying) modulation scheme, approximately 35-50% of received bad
frames have a totally correct spectrum. Also, approximately
75-85% of all bad frame spectral parameter coefficients are
correct. Because of the localized nature of the spectral
impact, as mentioned earlier, spectral parameter information can
be used in the bad frames. Channel conditions with a C/I in the
range 6-8 dB or less are so poor that the 12.65 kbit/s mode
should not be used; instead, some other, lower mode should be
used.

The basic idea of the present invention in the case of
corrupted frames is that according to a criterion (described
below), channel bits from a corrupt frame are used for decoding
the corrupt frame. The criterion for spectral coefficients is
based on the past values of the speech parameters of the signal
being decoded. When a bad frame is detected, the received LSFs
or other spectral parameters communicated over the channel are
used if the criterion is met; in other words, if the received
LSFs meet the criterion, they are used in decoding just as they
would be if the frame were not a bad frame. Otherwise, i.e. if
the LSFs from the channel do not meet the criterion, the
spectrum for a bad frame is calculated according to the
concealment method described above, using equations (2.1) or
(2.2). The criterion for accepting the spectral parameters can
be implemented by using for example a spectral distance
calculation such as a calculation of the so-called Itakura-Saito
spectral distance. (See, for example, page 329 of Discrete-Time
Processing of Speech Signals by John R. Deller Jr., John H.L.
Hansen, and John G. Proakis, published by IEEE Press, 2000.)
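For reference, one common discrete form of the Itakura-Saito distance between two power spectra is sketched below. It assumes both spectra are sampled on the same frequency grid with strictly positive values; the function name is hypothetical, and how such a distance would be thresholded is not specified here.

```python
import math


def itakura_saito_distance(power_ref, power_test):
    """Discrete Itakura-Saito distance between two power spectra.

    Averages r - ln(r) - 1 over the frequency bins, where r is the
    ratio power_ref / power_test in each bin. The measure is zero for
    identical spectra and is not symmetric in its arguments.
    """
    total = 0.0
    for p, q in zip(power_ref, power_test):
        r = p / q
        total += r - math.log(r) - 1.0
    return total / len(power_ref)
```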

The criterion for accepting the spectral parameters from
the channel should be very strict in the case of a stationary
speech signal. As shown in Fig. 2, the spectral coefficients
are very stable during a stationary sequence (by definition) so
that corrupted LSFs (or other speech parameters) of a stationary
speech signal can usually be readily detected (since they would
be distinguishable from uncorrupted LSFs on the basis that they
would differ dramatically from the LSFs of uncorrupted adjacent
frames). On the other hand, for a non-stationary speech signal,
the criterion need not be so strict; the spectrum for a
non-stationary speech signal is allowed to have a larger
variation. For a non-stationary speech signal, the exactness of
the correct spectral parameters is not strict in respect to
audible artifacts, since for non-stationary speech (i.e. more or
less unvoiced speech), no audible artifacts are likely regardless
of whether or not the speech parameters are correct. In other
words, even if bits of the spectral parameters are corrupted,
they can still be acceptable according to the criterion, since
spectral parameters for non-stationary speech with some corrupt
bits will not usually generate any audible artifacts. According
to the invention, the subjective quality of the synthesized
speech is to be diminished as little as possible in case of
corrupted frames by using all the available information about
the received LSFs, and by selecting which LSFs to use according
to the characteristics of the speech being conveyed.

Thus, although the invention includes a method for 
concealing corrupted frames, it also comprehends as an 
alternative using a criterion in case of a corrupted frame 
conveying non- stationary speech, which, if met, will cause the 
decoder to use the corrupted frame as is; in other words, even 
though the BFI is set, the frame will be used. The criterion is 
in essence a threshold used to distinguish between a corrupted 
frame that is useable and one that is not; the threshold is 
based on how much the spectral parameters of the corrupted frame 
differ from the spectral parameters of the most recently 
received good frames.

The use of possibly corrupted spectral parameters is
probably more prone to causing audible artifacts than the use of
other corrupted parameters, such as corrupted LTP lag values. For
this reason, the criterion used to determine whether or not to
use a possibly corrupt spectral parameter should be especially
reliable. In some embodiments, it is advantageous to use as the
criterion a maximum spectral distance (from a corresponding
spectral parameter in a previous frame, beyond which the suspect
spectral parameter is not to be used); in such an embodiment,
the well-known Itakura-Saito distance calculation could be used
to quantify the spectral distance to be compared with the
threshold. Alternatively, fixed or adaptive statistics of
spectral parameters could be used for determining whether or not
to use possibly corrupted spectral parameters. Other speech
parameters, such as gain parameters, could also be used for
generating the criterion. (If the other speech parameters are
not drastically different in the current frame, compared to the
values in the most recent good frame, then the spectral
parameters are probably okay to use, provided the received
spectral parameters also meet the criteria. In other words,
other parameters, such as LTP gain, can be used as an additional
component to set proper criteria to determine whether or not to
use the received spectral parameters. The history of the other
speech parameters can be used for improved recognition of speech
characteristics. For example, the history can be used to decide
whether the decoded speech sequence has a stationary or
non-stationary characteristic. When the properties of the decoded
speech sequence are known, it is easier to detect possibly
correct spectral parameters from the corrupted frame and it is
easier to estimate what kind of spectral parameter values are
expected to have been conveyed in a received corrupted frame.)

According to the invention in the preferred embodiment, and
now referring to Fig. 8, the criterion for determining whether
or not to use a spectral parameter for a corrupted frame is
based on the notion of a spectral distance, as mentioned above.
More specifically, to determine whether the criterion for
accepting the LSF coefficients of a corrupted frame is met, a
processor of the receiver executes an algorithm that checks how
much the LSF coefficients have moved along the frequency axis
compared to the LSF coefficients of the last good frame, which
are stored in an LSF buffer, along with the LSF coefficients of
some predetermined number of earlier, most recent frames.

The criterion according to the preferred embodiment
involves making one or more of four comparisons: an inter-frame
comparison, an intra-frame comparison, a two-point comparison,
and a single-point comparison.

In the first comparison, the inter-frame comparison, the
differences between the LSF vector elements of the corrupted
frame and those of the adjacent (preceding) frame are compared
to the corresponding differences of previous frames. The
differences are determined as follows:

    d_n(i) = |L_{n-1}(i) - L_n(i)|,   1 ≤ i ≤ P-1,

where P is the number of spectral coefficients for a frame,
L_n(i) is the i-th LSF element of the corrupted frame, and
L_{n-1}(i) is the i-th LSF element of the frame before the
corrupted frame. The LSF element, L_n(i), of the corrupted frame
is discarded if the difference, d_n(i), is too high compared to
d_{n-1}(i), d_{n-2}(i), ..., d_{n-k}(i), where k is the length
of the LSF buffer.
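
By way of illustration only, the inter-frame comparison might be sketched in Python as follows. The text does not say what "too high" means, so the rule used here (d_n(i) exceeding a hypothetical factor `margin` times the largest buffered difference) is an assumption, as are the function name, the 0-based indexing over every element, and the sample LSF values.

```python
def inter_frame_discards(history, candidate, margin=2.0):
    """Return indices i whose inter-frame difference d_n(i) looks too high.

    history   -- LSF vectors of recent good frames, oldest first (the LSF buffer);
                 at least two frames are needed to form past differences
    candidate -- LSF vector L_n of the corrupted frame
    margin    -- hypothetical tolerance factor (not given in the text)
    """
    previous = history[-1]
    discards = []
    for i in range(len(candidate)):
        # d_n(i) = |L_{n-1}(i) - L_n(i)|
        d_n = abs(previous[i] - candidate[i])
        # d_{n-1}(i), ..., d_{n-k}(i), taken from consecutive buffered frames
        past = [abs(a[i] - b[i]) for a, b in zip(history, history[1:])]
        if d_n > margin * max(past):  # assumed meaning of "too high"
            discards.append(i)
    return discards


# Illustrative values only (Hz); the second element jumps far from its history.
history = [[300.0, 700.0, 1200.0], [310.0, 690.0, 1210.0], [305.0, 705.0, 1195.0]]
candidate = [308.0, 1100.0, 1200.0]
print(inter_frame_discards(history, candidate))  # -> [1]
```

A multiplicative margin is only one possible choice; a fixed or adaptive statistic of the buffered differences would fit the same structure.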

The second comparison, the intra-frame comparison, is a
comparison of the differences between adjacent LSF vector elements
in the same frame. The distance between the candidate i-th LSF
element, L_n(i), of the n-th frame and the (i-1)-th LSF element,
L_n(i-1), of the n-th frame is determined as follows:

    e_n(i) = L_n(i-1) - L_n(i),   2 ≤ i ≤ P-1,

where P is the number of spectral coefficients and e_n(i) is the
distance between LSF elements. Distances are calculated between
all LSF vector elements of the frame. One or another or both of
the LSF elements L_n(i) and L_n(i-1) will be discarded if the
difference, e_n(i), is too large or too small compared to
e_{n-1}(i), e_{n-2}(i), ..., e_{n-k}(i).
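
A minimal Python sketch of the intra-frame comparison is given below, under stated assumptions. The magnitudes of e_n(i) are compared rather than the signed values, and "too large or too small" is taken to mean outside a band set by a hypothetical factor `margin` around the buffered extremes; neither choice comes from the text. The function only flags the pair (i-1, i), since the text leaves open which of the two elements is discarded.

```python
def intra_frame_suspects(history, candidate, margin=2.0):
    """Return indices i for which the gap to element i-1 is out of line.

    history   -- LSF vectors of recent good frames (the LSF buffer)
    candidate -- LSF vector L_n of the corrupted frame
    margin    -- hypothetical tolerance factor (not given in the text)
    """
    suspects = []
    for i in range(1, len(candidate)):
        # magnitude of e_n(i) = L_n(i-1) - L_n(i)
        e_n = abs(candidate[i - 1] - candidate[i])
        # magnitudes of e_{n-1}(i), ..., e_{n-k}(i) from the buffer
        past = [abs(frame[i - 1] - frame[i]) for frame in history]
        too_large = e_n > margin * max(past)
        too_small = e_n < min(past) / margin
        if too_large or too_small:
            suspects.append(i)
    return suspects


# Illustrative values only (Hz); elements 0 and 1 are implausibly close.
history = [[300.0, 700.0, 1200.0], [310.0, 690.0, 1210.0], [305.0, 705.0, 1195.0]]
candidate = [308.0, 320.0, 1200.0]
print(intra_frame_suspects(history, candidate))  # -> [1]
```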

The third comparison, the two-point comparison, determines
whether a crossover has occurred involving the candidate LSF
element L_n(i), i.e. whether an element L_n(i-1) that is lower in
order than the candidate element has a larger value than the
candidate LSF element L_n(i). A crossover indicates one or more
highly corrupted LSF values. All crossing LSF elements are
usually discarded.
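
The crossover test lends itself to a short sketch. The Python below is illustrative only: the function name and sample values are hypothetical, and it assumes that only adjacent elements are compared and that both elements of a crossing pair are marked.

```python
def crossover_indices(candidate):
    """Return the indices of all LSF elements involved in a crossover."""
    crossing = set()
    for i in range(1, len(candidate)):
        # A lower-order element with a larger value than L_n(i) is a crossover.
        if candidate[i - 1] > candidate[i]:
            crossing.update((i - 1, i))
    return sorted(crossing)


# Illustrative values only (Hz); elements 1 and 2 are out of order.
print(crossover_indices([308.0, 900.0, 705.0, 1200.0]))  # -> [1, 2]
```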

The fourth comparison, the single-point comparison,
compares the value of the candidate LSF vector element, L_n(i), to
a minimum LSF element, L_min(i), and to a maximum LSF element,
L_max(i), both calculated from the LSF buffer, and discards the
candidate LSF element if it lies outside the range bracketed by
the minimum and maximum LSF elements.
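
The single-point comparison can be sketched as below. The text says only that L_min(i) and L_max(i) are calculated from the LSF buffer; taking them as the element-wise minimum and maximum over the buffered frames is an assumption made for this example, as are the function name and the sample values.

```python
def out_of_range_indices(history, candidate):
    """Return indices i where L_n(i) lies outside [L_min(i), L_max(i)].

    history   -- LSF vectors of recent good frames (the LSF buffer)
    candidate -- LSF vector L_n of the corrupted frame
    """
    discards = []
    for i, value in enumerate(candidate):
        column = [frame[i] for frame in history]
        l_min, l_max = min(column), max(column)  # assumed definition of the bounds
        if not l_min <= value <= l_max:
            discards.append(i)
    return discards


# Illustrative values only (Hz); the second element is far above its buffered range.
history = [[300.0, 700.0, 1200.0], [310.0, 690.0, 1210.0], [305.0, 705.0, 1195.0]]
print(out_of_range_indices(history, [308.0, 1100.0, 1200.0]))  # -> [1]
```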

If an LSF element of a corrupted frame is discarded (based
on the above criterion or otherwise), then a new value for the
LSF element is calculated according to the algorithm using
equation (2.2).

Referring now to Fig. 7, a flowchart of the overall method
of the invention is shown, indicating the different provisions
for stationary and non-stationary speech frames, and for
corrupted as opposed to lost non-stationary speech frames.



Discussion 

The invention can be applied in a speech decoder in either
a mobile station or a mobile network element. It can also be
applied to any speech decoder used in a system having an
error-prone transmission channel.



Scope of the Invention

It is to be understood that the above-described
arrangements are only illustrative of the application of the
principles of the present invention. In particular, it should
be understood that although the invention has been shown and
described using line spectrum pairs for a concrete illustration,
the invention also comprehends using other, equivalent
parameters, such as immittance spectral pairs. Numerous
modifications and alternative arrangements may be devised by
those skilled in the art without departing from the spirit and
scope of the present invention, and the appended claims are
intended to cover such modifications and arrangements.